World Health Data Visualisation Project

This project, made for FIT5147: Data Exploration and Visualisation, explores trends in world health data.

Vinny Vu https://www.linkedin.com/in/vinny-vu-809bb1139/
09-18-2020

Introduction

Today it is a common belief that we are becoming increasingly unhealthy because diets are moving towards convenient, often unhealthy foods and physical inactivity is increasing (“Physical Inactivity: A Global Public Health Problem” 2014). To assess this statement we will look at the following health factors and their changes over time: height, body mass index, blood pressure and cholesterol.

It is also commonly believed that, because of better access to quality food, nutrients and wealth, humans are growing taller over time. It is therefore of interest to analyse the changes in height data.

Per the World Health Organization (“Raised Blood Pressure” 2015), hypertension, or raised blood pressure, is the condition in which the blood vessels have persistently raised pressure; the higher the pressure, the harder the heart must pump. Hypertension is a serious medical condition that can increase the risk of heart, brain, kidney and other diseases, and it is one of the major causes of premature death worldwide. Therefore, it is of interest to assess the trends in raised blood pressure data.

Body mass index, or BMI for short, is a measure of nutritional status in adults, defined as a person’s weight in kilograms divided by the square of the person’s height in meters (kg/m²) (“Body Mass Index - Bmi” 2020). BMI was developed as a risk indicator of disease: a higher BMI often indicates a higher level of excess body fat, which increases the risk of premature death and of conditions such as cardiovascular disease, high blood pressure, osteoarthritis, some cancers and diabetes (“Body Mass Index - Bmi” 2020). Therefore, it is of interest to assess the trends in BMI levels.
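As a small worked example of the definition above, the following R sketch computes BMI from weight and height. The function name `bmi` and the input values are illustrative assumptions and are not part of the NCD-RisC data.

```r
# BMI = weight in kilograms divided by the square of height in meters.
# `bmi` is a hypothetical helper written for this illustration.
bmi <- function(weight_kg, height_m) {
  stopifnot(is.numeric(weight_kg), is.numeric(height_m), all(height_m > 0))
  weight_kg / height_m^2
}

# A person weighing 70 kg at 1.75 m: 70 / 1.75^2, roughly 22.86 kg/m^2
bmi(70, 1.75)
```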

Per the Better Health Channel (Health & Human Services 2014), cholesterol is a type of fat that is part of all animal cells and is essential for many of the body’s metabolic processes, including the production of hormones, bile and vitamin D. Low-density lipoprotein (LDL) cholesterol carries most of the cholesterol that is delivered to cells and is often referred to as “bad” cholesterol because, when its level in the bloodstream is high, it can clog the arteries. High cholesterol leads to fatty deposits developing in the arteries, which causes the vessels to narrow and eventually become blocked, leading to heart disease and stroke. Therefore, it is of interest to assess the trends in LDL cholesterol levels.

Specifically we will be analyzing world health data to answer the following questions:

Data Wrangling

Data for the analysis has been taken from the NCD Risk Factor Collaboration (NCD-RisC), a network of health scientists around the world that provides rigorous and timely data on risk factors for non-communicable diseases (NCDs) for 200 countries and territories (“High Cholesterol Is Responsible for About 3.9 Million Worldwide Deaths,” n.d.).

The data sets used include the following:

Body-Mass Index data: This data set measures the yearly average national adult body-mass index for all countries from 1975 to 2016. The first 3 rows of the data set are shown below. The variables include the country, sex, year, mean BMI with the lower and upper 95% uncertainty intervals, and the prevalence of individuals falling into each BMI level with the lower and upper 95% uncertainty intervals. For the purposes of analysis we will only be looking at the mean BMI for each country.

Height data: This data set measures the mean height of adults (at age 18) for all countries, by year of birth, from 1896 to 1996. The first 3 rows of the data set are shown below. The variables include the country, sex, year of birth, and mean height in cm with the lower and upper 95% uncertainty intervals.

Blood Pressure data: This data set measures the yearly mean blood pressure readings (systolic, diastolic and prevalence of raised blood pressure) for adults for all countries from 1975 to 2015. The first 3 rows of the data set are shown below. The variables include the country, sex, year, mean systolic blood pressure reading (in mmHg) with the upper and lower 95% uncertainty interval, mean diastolic blood pressure reading (in mmHg) with the upper and lower 95% uncertainty interval, and the prevalence of raised blood pressure with the upper and lower 95% uncertainty interval.

Cholesterol data: This data set measures the yearly mean cholesterol readings (total cholesterol, non-HDL cholesterol, HDL cholesterol) for adults for all countries from 1980 to 2018. The first 3 rows of the data set are shown below. The variables include the country, sex, year, mean total cholesterol reading (in mmol/L) with the upper and lower 95% uncertainty interval, mean non-HDL cholesterol reading (in mmol/L) with the upper and lower 95% uncertainty interval, and the mean HDL cholesterol reading (in mmol/L) with the upper and lower 95% uncertainty interval.

For the analysis of question 3 we are interested in analyzing readings across each continent and across developed and developing countries. To determine which continent each country falls into, the R package countrycode was used; the developed/developing classification was obtained from the Australian Government Department of Foreign Affairs and Trade website. The developed/developing countries list was converted into Excel form to allow importing into R. A left join was used to add the continent and wealth status to each data set. To compare the relationship of BMI with height, blood pressure and cholesterol, all data sets were joined using an inner join.
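The two kinds of join described above can be sketched in base R with `merge()`. This is a minimal illustration under stated assumptions: the data frames, column names and values are made up, and the report itself cites the tidyverse, where `dplyr::left_join()` and `dplyr::inner_join()` play the same roles.

```r
# Toy stand-ins for the NCD-RisC tables; every value here is invented for illustration.
bmi_data <- data.frame(
  country = c("Australia", "Kenya", "Japan"),
  sex = "Men",
  year = 1990,
  mean_bmi = c(25.5, 21.0, 22.5)
)
chol_data <- data.frame(
  country = c("Australia", "Japan"),
  sex = "Men",
  year = 1990,
  mean_non_hdl = c(4.1, 3.5)
)
# Hypothetical lookup of continent and wealth status (Japan deliberately missing).
lookup <- data.frame(
  country = c("Australia", "Kenya"),
  continent = c("Oceania", "Africa"),
  wealth = c("Developed", "Developing")
)

# Left join: keep every BMI row; countries absent from the lookup get NA.
bmi_tagged <- merge(bmi_data, lookup, by = "country", all.x = TRUE)

# Inner join: keep only country/sex/year combinations present in both tables.
bmi_chol <- merge(bmi_data, chol_data, by = c("country", "sex", "year"))
```

The left join preserves all three BMI rows, which makes unmatched country names visible as `NA`, while the inner join drops Kenya because it has no cholesterol row in this toy data.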

Data Checking

The visdat R package (Tierney 2017) was the main tool used in data checking. After checking the NCD-RisC data for completeness, it was found that there was no missing data. As all the health data was compiled by NCD-RisC, the naming conventions of countries, country codes and years were the same across data sets, and therefore no modifications to variables were needed to allow joining the data sets. However, the joined data used for the analysis in section 2 was restricted to the years 1980-1996, these being the only years common to all data sets.
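The common year window follows directly from the coverage of the four data sets described earlier. A short R sketch, where the list and variable names are illustrative, intersects the year ranges:

```r
# Year coverage of each data set, as stated in the data descriptions.
year_ranges <- list(
  height = 1896:1996,
  bmi = 1975:2016,
  blood_pressure = 1975:2015,
  cholesterol = 1980:2018
)

# Successively intersect the ranges to find the years shared by all four data sets.
common_years <- Reduce(intersect, year_ranges)
range(common_years)  # 1980 1996
```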

The continents were added with the countrycode package (Arel-Bundock, Enevoldsen, and Yetman 2018) without producing any missing values, and therefore manipulation of the input countries was not needed. However, naming conventions of countries differ between the NCD-RisC data and the Developing/Developed country list obtained from the Australian Government Department of Foreign Affairs and Trade website (“List of Developing Countries as Declared by the Minister for Foreign Affairs,” n.d.). The Developing/Developed country Excel list was therefore modified to ensure the same naming conventions across both data sets.
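One way to find and reconcile such naming differences is sketched below in R. The country spellings are placeholders chosen for illustration, not the actual mismatches found in the two lists, and the recoding map is a hypothetical helper.

```r
# Placeholder name vectors; the real lists contain about 200 entries each.
ncd_countries <- c("Viet Nam", "Bolivia", "Kenya")
dfat_countries <- c("Vietnam", "Bolivia", "Kenya")

# Names in the NCD-RisC data with no exact match in the wealth list.
unmatched <- setdiff(ncd_countries, dfat_countries)

# Hypothetical recoding map: wealth-list spelling -> NCD-RisC spelling.
recode_map <- c("Vietnam" = "Viet Nam")
needs_fix <- dfat_countries %in% names(recode_map)
dfat_countries[needs_fix] <- unname(recode_map[dfat_countries[needs_fix]])

# After recoding, every NCD-RisC name should have a match.
still_unmatched <- setdiff(ncd_countries, dfat_countries)
```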

Data Exploration

To assess the trend in world average height, the height data set was used to calculate the average height across all countries for males and females, which is plotted as the line graph in Figure 1. From the graph we can see an upward trend in average height that plateaus and slightly dips around 1975. The difference between men’s and women’s average height appears to be fairly constant each year at around 10-15 cm.
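The world average described above can be sketched as a grouped mean by year and sex. The toy data, the column names, and the choice of a simple unweighted mean across countries (rather than a population-weighted one) are assumptions made for illustration.

```r
# Toy height data for two made-up countries "A" and "B"; values are invented.
height <- data.frame(
  country = rep(c("A", "B"), each = 4),
  sex = rep(c("Men", "Women"), times = 4),
  year = rep(rep(c(1950, 1951), each = 2), times = 2),
  mean_height_cm = c(170, 158, 171, 159, 174, 162, 175, 163)
)

# Unweighted mean across countries for each year and sex.
world_avg <- aggregate(mean_height_cm ~ year + sex, data = height, FUN = mean)
world_avg
```

The resulting data frame has one row per year and sex, which is the shape needed for a line graph with year on the x-axis and one line per sex.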

To look at the trend in individual country average height, the height data set was again used to produce a line graph, now grouped by country, in Figure 2. From the graph we can see an overall upward trend in average country height, with several countries stabilizing around 1975. Across several countries, however, we see a dip in average height in later years. The plot is very crowded, as it plots 200 countries, and it is difficult to see individual trends. Sub-grouping of countries in section 4.3 improves this by reducing the number of plotted lines.

Figure 1: World average height line graph with year on the x-axis and world average height in centimeters on the y-axis for men and women.

Figure 2: Country average height line graph with year on the x-axis and average height in centimeters for each country on the y-axis for men and women.

To assess the trend in world average BMI levels, the BMI data set was used to calculate the average BMI across all countries for males and females, which is plotted as the line graph in Figure 3. For the purposes of analysis we will only be looking at the average BMI, not the prevalence of each BMI level. From the graph we can see an upward trend in average BMI across all years. The difference between men’s and women’s average BMI appears to be fairly constant each year at around 5-8 kg/m², with women being higher than men across all years.

To look at the trend in individual country average BMI, the BMI data set was again used to produce a line graph, now grouped by country, in Figure 4. From the graph we can see an overall upward trend in several countries. However, for several countries with a high initial BMI, an initial upward trend followed by a dip around 1995 can be seen. Several countries show the opposite trend, trending downwards each year; this is especially prominent in the plot for women. The plot is very crowded, as it plots 200 countries, and it is difficult to see individual trends. Sub-grouping of countries in section 4.3 improves this by reducing the number of plotted lines.

Figure 3: World average BMI line graph with year on the x-axis and world average BMI in kilograms per meter squared on the y-axis for men and women.

Figure 4: Country average BMI line graph with year on the x-axis and average BMI in kilograms per meter squared for each country on the y-axis for men and women.

To assess the trend in world average blood pressure levels, the blood pressure data set was used to calculate the average of each blood pressure measure across all countries for males and females, plotted as the line graph in Figure 5. From the graph we can see that mean systolic and diastolic blood pressure remain fairly constant, with a slight downward trend for women. We do, however, see a larger downward trend in the prevalence of raised blood pressure.

For the purposes of analysis we will only be looking at the prevalence of raised blood pressure. To look at the trend in individual country prevalence of raised blood pressure, the blood pressure data set was again used to produce a line graph, now grouped by country, in Figure 6. From this graph it is harder to assess the individual trends of each country. We do, however, see an overall downward trend on average, although individual countries show both downward and upward trends of all magnitudes. The plot is very crowded, as it plots 200 countries, and it is difficult to see individual trends. Sub-grouping of countries in section 4.3 improves this by reducing the number of plotted lines.

Figure 5: World average Blood Pressure line graph with year on the x-axis and average world blood pressure readings in mmHg on the y-axis for men and women.

Figure 6: Country prevalence of raised blood pressure line graph with year on the x-axis and prevalence of raised blood pressure on the y-axis for men and women.

To assess the trend in world average cholesterol levels, the cholesterol data set was used. For the purposes of analysis only the trend in non-HDL cholesterol, which includes low-density lipoprotein (LDL) cholesterol, was assessed. The average non-HDL cholesterol across all countries was calculated for males and females and plotted as the line graph in Figure 7. From the graph we can see an upward trend from 1980, peaking in 1990, followed by a downward trend. The levels for men and women appear fairly similar across all years.

To look at the trend in individual country non-HDL cholesterol, the cholesterol data set was again used to produce a line graph, now grouped by country, in Figure 8. From this graph it is harder to assess the individual trends of each country. We do, however, see countries with both downward and upward trends of all magnitudes. The plot is very crowded, as it plots 200 countries, and it is difficult to see individual trends. Sub-grouping of countries in section 4.3 improves this by reducing the number of plotted lines.

Figure 7: World average non-HDL cholesterol line graph with year on the x-axis and world average non-HDL cholesterol in mmol/L on the y-axis for men and women.

Figure 8: Country average non-HDL cholesterol line graph with year on the x-axis and average non-HDL cholesterol in mmol/L on the y-axis for men and women.

Is there a relationship between BMI and other health factors (Height, Blood Pressure and Cholesterol) and what is it?

Height and BMI relationship

To assess the relationship between height and BMI, Figure 9 shows, on the left, the regression line for BMI vs height faceted by gender and continent and, on the right, the dot plot of all BMI and height points with the linear regression line added. A summary of the R squared value of the regression is shown in Table 1. From the plot there does not appear to be much of a relationship between height and BMI. This is further supported by the small R squared value shown in Table 1.

Figure 9: Height and BMI regression plot with mean BMI in kilograms per meter squared on the x-axis and mean height in cm on the y-axis

Table 1: Height and BMI linear regression R squared value
r.squared adj.r.squared
0.0370331 0.0368915
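The R squared values reported in tables such as Table 1 come from an ordinary least squares fit. A minimal base R sketch on synthetic data is shown below; the data are simulated, the column names are assumptions, and the numbers it produces are unrelated to the values in the report. The report cites the broom package, whose `glance()` function returns the same two statistics in tidy form.

```r
# Simulated data only; not the NCD-RisC values used in the report.
set.seed(1)
toy <- data.frame(mean_bmi = runif(200, min = 18, max = 32))
toy$mean_height_cm <- 165 + 0.3 * toy$mean_bmi + rnorm(200, sd = 6)

# Ordinary least squares with height as the response and BMI as the predictor.
fit <- lm(mean_height_cm ~ mean_bmi, data = toy)
fit_summary <- summary(fit)

# One-row table with the same two columns as the report's tables.
r2_table <- data.frame(
  r.squared = fit_summary$r.squared,
  adj.r.squared = fit_summary$adj.r.squared
)
r2_table
```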

BP and BMI

To assess the relationship between the prevalence of raised blood pressure and BMI, Figure 10 shows, on the left, the regression line for BMI vs prevalence of raised blood pressure faceted by gender and continent and, on the right, the dot plot of all blood pressure and BMI points with the linear regression line added. A summary of the R squared value of the regression is shown in Table 2. From the plot there appears to be a weak positive relationship between BMI and raised blood pressure; however, this trend is not prominent in the plot for men in the Americas. The weakness of the relationship is further supported by the small R squared value shown in Table 2.

Figure 10: Prevalence of raised blood pressure and BMI regression plot with mean BMI in kilograms per meter squared on the x-axis and prevalence of raised blood pressure on the y-axis

Table 2: Prevalence of raised blood pressure and BMI linear regression R squared value
r.squared adj.r.squared
0.0410391 0.040898

Cholesterol and BMI

To assess the relationship between mean non-HDL cholesterol and BMI, Figure 11 shows, on the left, the regression line for mean BMI vs mean non-HDL cholesterol faceted by gender and continent and, on the right, the dot plot of all mean BMI and mean cholesterol readings with the linear regression line added. A summary of the R squared value of the regression is shown in Table 3. From the plot there appears to be a positive relationship between BMI and mean non-HDL cholesterol. This trend is prominent across all the plots for men and women across all continents. We can see a stronger relationship than in Figure 9 and Figure 10. This is further supported by the larger R squared value shown in Table 3.

Figure 11: Mean non-HDL cholesterol and BMI regression plot with mean BMI in kilograms per meter squared on the x-axis and mean non-HDL cholesterol in mmol/L on the y-axis

Table 3: Mean non-HDL cholesterol and BMI linear regression R squared value
r.squared adj.r.squared
0.4528376 0.4527571

To further expand the analysis done in section 4.1, we will look in further detail at the trends across different continents and wealth levels. The continents assessed are Africa, the Americas, Asia, Europe and Oceania, obtained through the countrycode R package. Countries have further been split into developing and developed categories with guidance from the Australian Government Department of Foreign Affairs and Trade.

Figure 12 shows the trend in average height across continents by gender. From the plot we can see a clear upward trend in average height from 1900 to around 1960. From this point most continents begin to stabilize, whereas Africa and Oceania appear to trend downwards. This trend is apparent for both men and women. For both men and women Europe has the highest average height, followed by Oceania. Asia appears to have the lowest average across most years; however, for men the Asian average overtakes Africa around 1980.

Figure 12: Trend in average height by continent with year on the x-axis and average height in cm on the y-axis

Figure 13 shows the trend in average height between wealth levels (developed and developing). From the graph we can see that developed countries have the higher average height across all years for both men and women. There is a clear upward trend in average height from 1900 to around 1960 across both genders and wealth levels. For the developed countries the average height appears to stabilize from 1960 onward. For the developing countries, however, there appears to be a dip from 1960 onward, increasing the gap between developed and developing countries.

Figure 13: Trend in Average Height Between Wealth Levels with year on the x-axis and average height in cm on the y-axis

BMI trend across continents and wealth levels

Figure 14 shows the trend in average BMI across each continent. From the plot we can see a clear upward trend in average BMI across all continents for men and women. Oceania has the highest average across all years for both genders. For men, Africa has the lowest average, followed by Asia, then the Americas and Europe, across all years. For women, however, the upward trend for Europe is much smaller, and we see the Americas average overtaking Europe in the late 1980s.

Figure 14: Trend in Average BMI across continents with year on the x-axis and average BMI in kilograms per meter squared on the y-axis.

Figure 15 shows the trend in average BMI across wealth levels. From the plot we can see a clear upward trend in BMI for both genders and wealth levels, with developed countries having the higher average BMI across both genders for all years. The upward trend in BMI for men appears fairly similar between developed and developing countries, maintaining a similar gap across all years. For women, however, there is a sharper increase in average BMI for developing countries than for developed countries, closing the gap between the two averages.

Figure 15: Trend in Average BMI across Wealth Levels with year on the x-axis and average BMI in kilograms per meter squared on the y-axis

Blood Pressure Trend Across Continents and Wealth Levels

Figure 16 shows the trend in the prevalence of raised blood pressure across continents. From the plot we can see a clear downward trend in raised blood pressure across both genders for all continents except Africa, which has an increasing trend from 1975 to 1995 followed by a downward trend. Europe begins with the highest level for both genders and shows the largest reduction. European men are still at the highest level in 2015, whereas European women fall to the second lowest level among the continents.

Figure 16: Trend in the prevalence of raised blood pressure by continent with year on the x-axis and prevalence of raised blood pressure on the y-axis

Figure 17 shows the trend in the prevalence of raised blood pressure by wealth level. From the plot we can see a clear downward trend in raised blood pressure across both genders and wealth levels. The downward trend is more prominent for developed countries. For both genders, the developing country line at some point crosses over and overtakes the developed country line.

Figure 17: Trend in the prevalence of raised blood pressure by wealth level with year on the x-axis and prevalence of raised blood pressure on the y-axis

Cholesterol Trend Across Continents and Wealth Levels

Figure 18 shows the trend in mean non-HDL cholesterol across continents. From the plot we can see a clear downward trend for both men and women in Europe and a clear upward trend for both genders in Africa. The trends for the other three continents appear fairly constant. We can see the mean non-HDL cholesterol for European women crossing those of all continents except Africa in the later years.

Figure 18: Trend in mean non-HDL cholesterol across continents with year on the x-axis and mean non-HDL cholesterol in mmol/L on the y-axis

Figure 19 shows the trend in mean non-HDL cholesterol across wealth levels. From the plot we can see a clear downward trend for developed countries across both genders and an upward trend for developing countries across both genders. The gap between developed and developing countries narrows, reaching its lowest level in 2018.

Figure 19: Trend in mean non-HDL cholesterol across wealth levels with year on the x-axis and mean non-HDL cholesterol in mmol/L on the y-axis

Conclusion

From the analysis conducted we were able to analyze the world health data to explore the trends in country and world height, BMI, blood pressure and cholesterol. In section 4.1 we can see there is an overall upward trend in world height and BMI but an overall downward trend in raised blood pressure and cholesterol. However, due to the large number of countries analyzed, we were unable to assess individual country trends and instead used stacked line graphs to assess the overall trend and the differences between countries. From section 4.2 we can see there is not a strong relationship between BMI and height or blood pressure; however, there appears to be a strong positive relationship between BMI and cholesterol. In section 4.3 we were able to analyse the trends in health data between different continents and wealth levels. From the analysis we can see that, overall, countries at the lower wealth level appear to record better health readings, with lower BMI, blood pressure and cholesterol readings. However, over time lower wealth countries experience worsening readings, whereas higher wealth countries experience improvements in averages.

Reflection

From this analysis I was able to use the tidyverse (Wickham et al. 2019), gridExtra (Auguie 2017), countrycode (Arel-Bundock, Enevoldsen, and Yetman 2018), kableExtra (Zhu 2020) and broom (Robinson, Hayes, and Couch 2020) to analyze the world health data using various line graphs, dot plots, regression lines and ordinary least squares analyses. To further improve this report, however, specific countries could have been chosen for analysis, as comparing 200 countries on one graph did not reveal much information. Further, other models could be used to assess the relationship between BMI and the other health factors, given the weak fits obtained.

Arel-Bundock, Vincent, Nils Enevoldsen, and CJ Yetman. 2018. “Countrycode: An R Package to Convert Country Names and Country Codes.” Journal of Open Source Software 3 (28): 848. https://doi.org/10.21105/joss.00848.

Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

“Body Mass Index - Bmi.” 2020. World Health Organization. https://www.euro.who.int/en/health-topics/disease-prevention/nutrition/a-healthy-lifestyle/body-mass-index-bmi.

Health & Human Services, Department of. 2014. “Cholesterol.” Better Health Channel. Department of Health & Human Services. https://www.betterhealth.vic.gov.au/health/conditionsandtreatments/cholesterol.

“High Cholesterol Is Responsible for About 3.9 Million Worldwide Deaths.” n.d. RisC. http://ncdrisc.org/.

“List of Developing Countries as Declared by the Minister for Foreign Affairs.” n.d. DFAT. https://www.dfat.gov.au/about-us/publications/Pages/list-of-developing-countries-as-declared-by-the-minister-for-foreign-affairs.

“Physical Inactivity: A Global Public Health Problem.” 2014. World Health Organization. https://www.who.int/dietphysicalactivity/factsheet_inactivity/en/.

Robinson, David, Alex Hayes, and Simon Couch. 2020. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.

Tierney, Nicholas. 2017. “Visdat: Visualising Whole Data Frames.” Journal of Open Source Software 2 (16): 355. https://doi.org/10.21105/joss.00355.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Zhu, Hao. 2020. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.